-
Notifications
You must be signed in to change notification settings - Fork 807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change UnionArray
constructors
#5623
Conversation
In light of #5368 I think I would like to sit on this for a bit, and we can look to get this bundled in when we next make a breaking release |
I took the liberty of merging up main, making some minor tweaks to avoid hashmaps, and adding some additional tests. I think if you're happy this is then ready to go in (the next release is going to be breaking) |
Why would we want to avoid hashmaps/sets here? It seems the negative |
They're slower, given there are only 128 possibilities and they'll almost always be clustered towards the lower end, it is faster just to use a vec
Would you care to expand on this? |
Is there a benchmark? I'm wondering how much slower.
7cd47de#diff-7e118aa8aabb6fbbb2b5dcd508dd5b3219300bcb829c6cbef6cf35146320806aR191-R199 |
The check got moved to here - a69cbac#diff-7e118aa8aabb6fbbb2b5dcd508dd5b3219300bcb829c6cbef6cf35146320806aR209
No, but I would expect it to be quite significant. You are comparing the performance of a pointer indirection to a hashmap lookup. Given this check is done for every type in the array, I would expect it to be quite significant |
Which issue does this PR close?
Closes #5613.
Rationale for this change
It came up in #5585 that the
UnionArray
constructors are a bit odd when compared to the other array constructors.What changes are included in this PR?
UnionArray::try_new
andUnionArray::new_unchecked
Changed arguments:
From:
field_type_ids: &[i8], type_ids: Buffer, value_offsets: Option<Buffer>, child_arrays: Vec<(Field, ArrayRef)>
To:
fields: UnionFields, type_ids: ScalarBuffer<i8>, offsets: Option<ScalarBuffer<i32>>, children: Vec<ArrayRef>
.The order of
children
arrays must match the order of fields in theUnionFields
.The validation in
try_new
is a bit more strict.UnionBuilder
It now uses a
BTreeMap
instead of aHashMap
internally to get sorted iteration.Are there any user-facing changes?
This is a breaking change that changes the
UnionArray
constructors.